A Method of Automatic Hypertext Construction from an Encyclopedic Dictionary of a Specific Field

Authors

  • Sadao Kurohashi
  • Makoto Nagao
  • Satoshi Sato
  • Masahiko Murakami
Abstract

1 Introduction

Nowadays, very large volumes of text are created and stored in computers. As a result, retrieving the texts that fit a user's needs has become a difficult problem. Hypertext is a typical system for addressing this problem. Its primary objective is to establish flexible associative links between relevant text parts and to let users select and follow those links to see the related text contents. A difficult problem here is how to automatically construct such a network structure over a given set of text data. This paper is concerned with (1) automatic conversion of a plain text set into a hypertext structure, and (2) construction of a flexible human interface for the hypertext system. We applied natural language processing methods to locate important conceptual terms in a text corpus and to establish a variety of links between these terms and appropriate text portions.

2 Extraction of thesaurus information

The text corpus we handled as a concrete example was the Encyclopedic Dictionary of Computer Science (hereafter EDCS; Iwanami Publ., 1990; an English translation will appear soon from Academic Press). It includes 4,500 terms and has a text volume of two million Japanese characters (4 megabytes). The first part of each term description in EDCS is devoted to synonyms, antonyms, abbreviations, and broader concept words. This part has typical sentence patterns such as:
(ii) A is abbreviated as B.
(iv) A stands for B.
(v) We call A B (for short).
By finding these sentence patterns, the relation between the words A and B is established as follows:
(i) A p-link is set up from a synonym word to the sentence that defines the synonym relation.
(ii) An s-link (by synonym) is set up from a defined word to its defining words by the synonym relation.
Typical sentence patterns of intensional definition are:
(i) A is defined as B. A is regarded as B.
(ii) A means B. A connotes B. A is B.
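The sentence-pattern matching described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. EDCS is written in Japanese, so the original system matched Japanese sentence patterns. The English regular expressions, the `PATTERNS` table, and the `match_sentence` function below are all hypothetical. Patterns are tried in order, so the most generic form, "A is B", is checked last.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical English renderings of the sentence patterns listed above.
# The real system worked on Japanese text; these regexes only illustrate the idea.
PATTERNS: List[Tuple[str, "re.Pattern"]] = [
    ("synonym", re.compile(r"^(?P<a>.+?) is abbreviated as (?P<b>.+?)\.$")),
    ("synonym", re.compile(r"^(?P<a>.+?) stands for (?P<b>.+?)\.$")),
    ("synonym", re.compile(r"^We call (?P<a>.+?) (?P<b>\S+) for short\.$")),
    ("definition", re.compile(r"^(?P<a>.+?) is defined as (?P<b>.+?)\.$")),
    ("definition", re.compile(r"^(?P<a>.+?) is regarded as (?P<b>.+?)\.$")),
    ("definition", re.compile(r"^(?P<a>.+?) means (?P<b>.+?)\.$")),
    ("definition", re.compile(r"^(?P<a>.+?) connotes (?P<b>.+?)\.$")),
    # Most generic pattern, so it is tried last.
    ("definition", re.compile(r"^(?P<a>.+?) is (?P<b>.+?)\.$")),
]


def match_sentence(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Return (relation, A, B) for the first matching pattern, or None."""
    s = sentence.strip()
    for relation, pattern in PATTERNS:
        m = pattern.match(s)
        if m:
            return relation, m.group("a"), m.group("b")
    return None
```

For example, `match_sentence("CPU stands for central processing unit.")` yields a synonym triple with A = `"CPU"` and B = `"central processing unit"`. A sentence that fits none of the patterns yields `None`.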
By identifying these patterns in a term description, the relation between the defined word (A) and the definition sentences is established as follows:
(i) A p-link is set up from the defined word to the definition sentence when the defined word is not the headword of the term description. This is the case when the defined word is not important enough to be a headword of the dictionary, so a rather simple definition description …
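The link-setting rules can likewise be sketched. The sketch assumes each sentence of a term description has already been matched to a (relation, A, B) triple. The `Link` record, the `build_links` function, and the sentence-id format are hypothetical. For synonym sentences, the sketch further assumes that A is the defined word and B the synonym. The rule for a definition of the headword itself is not modeled, because the text describing it is cut off.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Link:
    kind: str    # "p-link" (word -> sentence) or "s-link" (word -> word)
    source: str  # the word the link starts from
    target: str  # a hypothetical sentence id, or a word


def build_links(headword: str, matches: List[Tuple[int, str, str, str]]) -> List[Link]:
    """Build links for one term description.

    `matches` holds (sentence_index, relation, A, B) tuples, e.g. produced by
    a pattern matcher over the sentences of the description.
    """
    links: List[Link] = []
    for idx, relation, a, b in matches:
        sentence_id = f"{headword}#S{idx}"  # hypothetical id format
        if relation == "synonym":
            # Assumption: A is the defined word, B the synonym (defining) word.
            # p-link: synonym word -> sentence stating the synonym relation.
            links.append(Link("p-link", b, sentence_id))
            # s-link: defined word -> defining (synonym) word.
            links.append(Link("s-link", a, b))
        elif relation == "definition" and a != headword:
            # p-link: defined word -> its definition sentence, only when the
            # defined word is not the headword of this term description.
            links.append(Link("p-link", a, sentence_id))
        # Definitions of the headword itself are skipped here (rule not given).
    return links
```

Under these assumptions, a synonym sentence contributes two links. A definition of a non-headword term contributes one p-link, and a definition of the headword contributes none.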

Publication date: 1992